[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP #146033

ergawy · 2025-06-27T07:06:01Z

Now that we have changes introduced by #145837, mapping reductions from do concurrent to OpenMP is almost trivial. This PR adds such mapping.

PR stack:

[flang][do concurrent] Re-model reduce to match reductions are modelled in OpenMP and OpenACC #145837
[NFC][flang] Move ReductionProcessor to Lower/Support. #146025
[flang][fir] Small clean-up in fir_DoConcurrentLoopOp's defintion #146028
[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP #146033 (this one)

llvmbot · 2025-06-27T07:06:32Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Now that we have changes introduced by #145837, mapping reductions from do concurrent to OpenMP is almost trivial. This PR adds such mapping.

TODO: Add tests

Full diff: https://github.com/llvm/llvm-project/pull/146033.diff

1 Files Affected:

(modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+56-27)

diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 709cf1d0938fa..31076f6eb328f 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -312,6 +312,19 @@ class DoConcurrentConversion
               bool isComposite) const {
     mlir::omp::WsloopOperands wsloopClauseOps;
 
+    auto cloneFIRRegionToOMP = [&rewriter](mlir::Region &firRegion,
+                                           mlir::Region &ompRegion) {
+      if (!firRegion.empty()) {
+        rewriter.cloneRegionBefore(firRegion, ompRegion, ompRegion.begin());
+        auto firYield =
+            mlir::cast<fir::YieldOp>(ompRegion.back().getTerminator());
+        rewriter.setInsertionPoint(firYield);
+        rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
+                                            firYield.getOperands());
+        rewriter.eraseOp(firYield);
+      }
+    };
+
     // For `local` (and `local_init`) opernads, emit corresponding `private`
     // clauses and attach these clauses to the workshare loop.
     if (!loop.getLocalVars().empty())
@@ -326,50 +339,65 @@ class DoConcurrentConversion
           TODO(localizer.getLoc(),
                "local_init conversion is not supported yet");
 
-        auto oldIP = rewriter.saveInsertionPoint();
+        mlir::OpBuilder::InsertionGuard guard(rewriter);
         rewriter.setInsertionPointAfter(localizer);
+
         auto privatizer = rewriter.create<mlir::omp::PrivateClauseOp>(
             localizer.getLoc(), sym.getLeafReference().str() + ".omp",
             localizer.getTypeAttr().getValue(),
             mlir::omp::DataSharingClauseType::Private);
 
-        if (!localizer.getInitRegion().empty()) {
-          rewriter.cloneRegionBefore(localizer.getInitRegion(),
-                                     privatizer.getInitRegion(),
-                                     privatizer.getInitRegion().begin());
-          auto firYield = mlir::cast<fir::YieldOp>(
-              privatizer.getInitRegion().back().getTerminator());
-          rewriter.setInsertionPoint(firYield);
-          rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
-                                              firYield.getOperands());
-          rewriter.eraseOp(firYield);
-        }
-
-        if (!localizer.getDeallocRegion().empty()) {
-          rewriter.cloneRegionBefore(localizer.getDeallocRegion(),
-                                     privatizer.getDeallocRegion(),
-                                     privatizer.getDeallocRegion().begin());
-          auto firYield = mlir::cast<fir::YieldOp>(
-              privatizer.getDeallocRegion().back().getTerminator());
-          rewriter.setInsertionPoint(firYield);
-          rewriter.create<mlir::omp::YieldOp>(firYield.getLoc(),
-                                              firYield.getOperands());
-          rewriter.eraseOp(firYield);
-        }
-
-        rewriter.restoreInsertionPoint(oldIP);
+        cloneFIRRegionToOMP(localizer.getInitRegion(),
+                            privatizer.getInitRegion());
+        cloneFIRRegionToOMP(localizer.getDeallocRegion(),
+                            privatizer.getDeallocRegion());
 
         wsloopClauseOps.privateVars.push_back(op);
         wsloopClauseOps.privateSyms.push_back(
             mlir::SymbolRefAttr::get(privatizer));
       }
 
+    if (!loop.getReduceVars().empty()) {
+      for (auto [op, byRef, sym, arg] : llvm::zip_equal(
+               loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
+               loop.getReduceSymsAttr().getAsRange<mlir::SymbolRefAttr>(),
+               loop.getRegionReduceArgs())) {
+        auto firReducer =
+            mlir::SymbolTable::lookupNearestSymbolFrom<fir::DeclareReductionOp>(
+                loop, sym);
+
+        mlir::OpBuilder::InsertionGuard guard(rewriter);
+        rewriter.setInsertionPointAfter(firReducer);
+
+        auto ompReducer = rewriter.create<mlir::omp::DeclareReductionOp>(
+            firReducer.getLoc(), sym.getLeafReference().str() + ".omp",
+            firReducer.getTypeAttr().getValue());
+
+        cloneFIRRegionToOMP(firReducer.getAllocRegion(),
+                            ompReducer.getAllocRegion());
+        cloneFIRRegionToOMP(firReducer.getInitializerRegion(),
+                            ompReducer.getInitializerRegion());
+        cloneFIRRegionToOMP(firReducer.getReductionRegion(),
+                            ompReducer.getReductionRegion());
+        cloneFIRRegionToOMP(firReducer.getAtomicReductionRegion(),
+                            ompReducer.getAtomicReductionRegion());
+        cloneFIRRegionToOMP(firReducer.getCleanupRegion(),
+                            ompReducer.getCleanupRegion());
+
+        wsloopClauseOps.reductionVars.push_back(op);
+        wsloopClauseOps.reductionByref.push_back(byRef);
+        wsloopClauseOps.reductionSyms.push_back(
+            mlir::SymbolRefAttr::get(ompReducer));
+      }
+    }
+
     auto wsloopOp =
         rewriter.create<mlir::omp::WsloopOp>(loop.getLoc(), wsloopClauseOps);
     wsloopOp.setComposite(isComposite);
 
     Fortran::common::openmp::EntryBlockArgs wsloopArgs;
     wsloopArgs.priv.vars = wsloopClauseOps.privateVars;
+    wsloopArgs.reduction.vars = wsloopClauseOps.reductionVars;
     Fortran::common::openmp::genEntryBlock(rewriter, wsloopArgs,
                                            wsloopOp.getRegion());
 
@@ -393,7 +421,8 @@ class DoConcurrentConversion
                              clauseOps.loopLowerBounds.size())))
       rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
 
-    for (unsigned i = 0; i < loop.getLocalVars().size(); ++i)
+    for (unsigned i = 0;
+         i < loop.getLocalVars().size() + loop.getReduceVars().size(); ++i)
       loopNestOp.getRegion().eraseArgument(clauseOps.loopLowerBounds.size());
 
     return loopNestOp;

tblah

LGTM but please could you add a test that covers every region.

ergawy · 2025-06-30T05:28:31Z

LGTM but please could you add a test that covers every region.

Thanks for the review. Done.

mjklemm

LGTM

clementval

LGTM

…lled in OpenMP and OpenACC This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm#145837 (this one) - llvm#146025 - llvm#146028 - llvm#146033

With llvm#145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm#145837 - llvm#146025 (this one) - llvm#146028 - llvm#146033

Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 (this one) - llvm#146033

Now that we have changes introduced by llvm#145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm#145837 - llvm#146025 - llvm#146028 - llvm#146033 (this one)

…lled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - #145837 (this one) - #146025 - #146028 - #146033

…ns are modelled in OpenMP and OpenACC (#145837) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - llvm/llvm-project#145837 (this one) - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033

With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - #145837 - #146025 (this one) - #146028 - #146033

… (#146025) With #145837, the `ReductionProcessor` component is now used by both OpenMP and `do concurrent`. Therefore, this PR moves it to a shared location: `flang/Lower/Support`. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 (this one) - llvm/llvm-project#146028 - llvm/llvm-project#146033

…146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - #145837 - #146025 - #146028 (this one) - #146033

Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping.

…defintion (#146028) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 (this one) - llvm/llvm-project#146033

…` to OpenMP (#146033) Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - llvm/llvm-project#145837 - llvm/llvm-project#146025 - llvm/llvm-project#146028 - llvm/llvm-project#146033 (this one)

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jun 27, 2025

ergawy marked this pull request as draft June 27, 2025 07:30

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from b99122b to c99a502 Compare June 27, 2025 14:17

ergawy requested review from skatrak, tblah, mjklemm and bhandarkar-pranav June 27, 2025 14:18

ergawy marked this pull request as ready for review June 27, 2025 14:20

ergawy requested review from clementval and jeanPerier June 27, 2025 14:27

tblah approved these changes Jun 27, 2025

View reviewed changes

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 5f665c9 to 105ac7a Compare June 30, 2025 04:19

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from c99a502 to 34a5df1 Compare June 30, 2025 04:23

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 105ac7a to 3e2bdf5 Compare June 30, 2025 04:40

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 34a5df1 to 185b43f Compare June 30, 2025 05:28

mjklemm approved these changes Jul 1, 2025

View reviewed changes

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 3e2bdf5 to 15ed831 Compare July 2, 2025 15:42

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 185b43f to 8c25fe7 Compare July 2, 2025 15:47

clementval approved these changes Jul 6, 2025

View reviewed changes

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 15ed831 to 53f9c0d Compare July 9, 2025 07:18

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 8c25fe7 to 39e6ed7 Compare July 9, 2025 07:18

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 53f9c0d to 8e67de7 Compare July 11, 2025 04:41

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_7 branch from 8e67de7 to df63fea Compare July 11, 2025 05:45

Base automatically changed from users/ergawy/convert_locality_specs_to_clauses_7 to main July 11, 2025 06:30

ergawy added 2 commits July 11, 2025 01:31

[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP

e018c9e

Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping.

Add all regions test.

c1834bb

ergawy force-pushed the users/ergawy/convert_locality_specs_to_clauses_8 branch from 39e6ed7 to c1834bb Compare July 11, 2025 06:31

ergawy merged commit 0e9b7b0 into main Jul 11, 2025
9 checks passed

ergawy deleted the users/ergawy/convert_locality_specs_to_clauses_8 branch July 11, 2025 07:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP #146033

[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP #146033

ergawy commented Jun 27, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jun 27, 2025

Uh oh!

tblah left a comment

Uh oh!

ergawy commented Jun 30, 2025

Uh oh!

mjklemm left a comment

Uh oh!

clementval left a comment

Uh oh!

Uh oh!

Uh oh!

[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP #146033

[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP #146033

Conversation

ergawy commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 27, 2025

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

ergawy commented Jun 30, 2025

Uh oh!

mjklemm left a comment

Choose a reason for hiding this comment

Uh oh!

clementval left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP #146033

[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP #146033

ergawy commented Jun 27, 2025 •

edited

Loading